Deep Metric Learning (DML) is a family of techniques that aim to measure the similarity between objects with neural networks. Although the number of DML methods has rapidly increased in recent years, most previous studies cannot effectively handle noisy data, which commonly exist in practical applications and often lead to serious performance deterioration. To overcome this limitation, in this paper we build a connection between noisy samples and hard samples within the framework of self-paced learning, and propose a \underline{B}alanced \underline{S}elf-\underline{P}aced \underline{M}etric \underline{L}earning (BSPML) algorithm with a denoising multi-similarity formulation, where noisy samples are treated as extremely hard samples and adaptively excluded from model training by sample weighting. In particular, owing to the pairwise relationship and a new balance regularization term, the sub-problem \emph{w.r.t.} the sample weights is a nonconvex quadratic function. To solve this nonconvex quadratic problem efficiently, we propose a doubly stochastic projection coordinate gradient algorithm. Importantly, we theoretically prove convergence not only for the doubly stochastic projection coordinate gradient algorithm, but also for our BSPML algorithm as a whole. Experimental results on several standard data sets demonstrate that our BSPML algorithm has better generalization ability and robustness than state-of-the-art robust DML approaches.
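The key mechanism above, excluding extremely hard (likely noisy) pairs via sample weighting, can be illustrated with the classical hard self-paced weighting rule. This is a minimal sketch with made-up losses, not the BSPML formulation itself, which additionally involves a multi-similarity loss and a balance regularizer:

```python
import numpy as np

def self_paced_weights(pair_losses, age):
    """Hard self-paced weighting: pairs whose loss exceeds the age
    parameter are treated as too hard (potentially noisy) and receive
    weight 0; the remaining pairs receive weight 1."""
    return (pair_losses < age).astype(float)

# Toy pairwise losses: the last value mimics a noisy pair with a huge loss.
losses = np.array([0.1, 0.4, 0.2, 5.0])
w = self_paced_weights(losses, age=1.0)
print(w)  # the noisy pair is excluded: [1. 1. 1. 0.]
```

During training such weights multiply the per-pair losses, so excluded pairs contribute no gradient.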
High-quality traffic flow generation is the core module in building simulators for autonomous driving. However, the majority of available simulators are incapable of replicating traffic patterns that accurately reflect the various features of real-world data while also simulating human-like reactive responses to the tested autopilot driving strategies. As a step toward addressing this problem, we propose Realistic Interactive TrAffic flow (RITA) as an integrated component of existing driving simulators to provide high-quality traffic flow for the evaluation and optimization of the tested driving strategies. RITA is developed with fidelity, diversity, and controllability in mind, and consists of two core modules called RITABackend and RITAKit. RITABackend is built to support vehicle-wise control and to provide traffic generation models learned from real-world datasets, while RITAKit is developed with easy-to-use interfaces for controllable traffic generation via RITABackend. We demonstrate RITA's capacity to create diversified and high-fidelity traffic simulations in several highly interactive highway scenarios. The experimental findings demonstrate that the traffic flows produced by RITA meet all three design goals, hence enhancing the completeness of driving strategy evaluation. Moreover, we showcase the possibility of further improving baseline strategies through online fine-tuning with RITA traffic flows.
Nowadays, self-paced learning (SPL) is an important machine learning paradigm that mimics the cognitive process of humans and animals. The SPL regime involves a self-paced regularizer and a gradually increasing age parameter, which plays a key role in SPL, yet where to optimally terminate this process remains non-trivial. A natural idea is to compute the solution path w.r.t. the age parameter (i.e., the age-path). However, current age-path algorithms are either limited to the simplest regularizer or lack a solid theoretical understanding as well as computational efficiency. To address this challenge, we propose a novel \underline{G}eneralized \underline{Ag}e-path \underline{A}lgorithm (GAGA) for SPL with various self-paced regularizers, based on ordinary differential equations (ODEs) and sets control, which can learn the entire solution spectrum w.r.t. a range of age parameters. To the best of our knowledge, GAGA is the first exact path-following algorithm tackling the age-path for general self-paced regularizers. Finally, the algorithmic steps of classic SVM and Lasso are described in detail. We demonstrate the performance of GAGA on real-world datasets and find considerable speedups between our algorithm and competing baselines.
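The object GAGA computes, the solution path of the sample weights as the age parameter grows, can be previewed with a brute-force grid sweep under the simplest hard self-paced regularizer. This sketch only illustrates what an age-path looks like (its breakpoints sit at the sample losses); it is not the ODE-based path-following algorithm itself:

```python
import numpy as np

def hard_spl_weights(losses, age):
    # Closed-form minimizer of sum_i (v_i * l_i) - age * sum_i v_i
    # over v in [0, 1]^n: include a sample iff its loss is below the age.
    return (losses < age).astype(float)

losses = np.array([0.2, 0.5, 1.5])
# Sweep the age parameter and record the whole weight path:
# as the age grows, progressively harder samples are admitted.
ages = [0.1, 0.3, 1.0, 2.0]
path = np.array([hard_spl_weights(losses, a) for a in ages])
print(path)
```

An exact path-following method exploits that the path is piecewise constant between breakpoints, so it never needs such a grid.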
Sparsity-regularized loss minimization problems play an important role in various fields, including machine learning, data mining, and modern statistics. Proximal gradient descent and coordinate descent are the most popular methods for solving such minimization problems. Although existing methods can achieve implicit model identification, i.e., support set identification, within a finite number of iterations, they still suffer from huge computational costs and memory burdens in high-dimensional settings. The reason is that the support set identification in these methods is implicit, so they cannot explicitly exploit the low-complexity structure in practice; that is, they cannot discard the useless coefficients of irrelevant features through dimension reduction to achieve algorithmic acceleration. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity-regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during optimization, and eventually achieve faster explicit model identification and improved algorithmic efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method.
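The idea of explicitly dropping inactive coefficients can be illustrated with cyclic coordinate descent for the Lasso, where coordinates that settle at zero are removed from the active set after each epoch. This is a deterministic, simplified stand-in for the idea; the actual ADSGD method is stochastic and block-wise:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd_with_screening(X, y, lam, n_epochs=50):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1.
    Coordinates that remain at zero after an epoch are dropped from the
    active set, so later epochs touch fewer coordinates."""
    n, d = X.shape
    w = np.zeros(d)
    active = list(range(d))
    r = y - X @ w                      # current residual
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_epochs):
        for j in active:
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)   # keep residual consistent
            w[j] = w_new
        active = [j for j in active if w[j] != 0.0]  # explicit screening
        if not active:
            break
    return w, active

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X[:, 0] * 2.0                      # only feature 0 is relevant
w, active = lasso_cd_with_screening(X, y, lam=10.0)
print(active)                          # only the relevant feature survives
```

After the first epoch, the four irrelevant coordinates are screened out, so the remaining 49 epochs each update a single coordinate.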
Learning to improve AUC performance is an important topic in machine learning. However, AUC maximization algorithms may suffer from degraded generalization performance due to noisy data. Self-paced learning is an effective method for handling noisy data. However, existing self-paced learning methods are limited to pointwise learning, while AUC maximization is a pairwise learning problem. To solve this challenging problem, we innovatively propose a balanced self-paced AUC maximization algorithm (BSPAUC). Specifically, we first provide a statistical objective for self-paced AUC. Based on this, we propose our self-paced AUC maximization formulation, where a novel balanced self-paced regularization term is embedded to ensure that the selected positive and negative samples have proper proportions. Importantly, the sub-problem with respect to all weight variables may be nonconvex in our formulation, whereas it is normally convex in existing self-paced problems. To address this, we propose a doubly cyclic block coordinate descent method. More importantly, we prove that the sub-problem with respect to all weight variables converges to a stationary point on the basis of closed-form solutions, and that our BSPAUC converges to a stationary point of our fixed optimization objective under mild assumptions. Considering both deep-learning-based and kernel-based implementations, experimental results on several large-scale datasets demonstrate that our BSPAUC has better generalization performance than existing state-of-the-art AUC maximization methods.
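The pairwise structure, and why the weight sub-problem can become nonconvex, can be seen in a tiny weighted AUC surrogate. The product of per-sample weights makes the objective bilinear in the positive and negative weights. This is an illustrative sketch with a squared-hinge surrogate and invented scores, not the BSPAUC objective (which also carries the balance regularizer):

```python
import numpy as np

def weighted_pairwise_auc_loss(pos_scores, neg_scores, v_pos, v_neg, margin=1.0):
    """Squared-hinge surrogate for AUC over all positive/negative pairs,
    with per-sample self-paced weights v_pos, v_neg in [0, 1].
    A pair's weight is the product of its two sample weights, which is
    the source of the bilinear (hence possibly nonconvex) sub-problem."""
    diff = pos_scores[:, None] - neg_scores[None, :]   # s_i^+ - s_j^-
    pair_loss = np.maximum(margin - diff, 0.0) ** 2    # hinge^2 per pair
    pair_w = v_pos[:, None] * v_neg[None, :]           # bilinear pair weights
    return (pair_w * pair_loss).sum() / max(pair_w.sum(), 1e-12)

pos = np.array([2.0, 1.5])
neg = np.array([0.0, 3.0])             # the 3.0 mimics a noisy negative
full = weighted_pairwise_auc_loss(pos, neg, np.ones(2), np.ones(2))
denoised = weighted_pairwise_auc_loss(pos, neg, np.ones(2), np.array([1.0, 0.0]))
print(full, denoised)                  # zeroing the noisy sample removes its pairs
```

Down-weighting the single noisy negative removes every pair it participates in, which is exactly why sample-level weights suffice for a pairwise objective.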
Recent studies have shown that deep neural networks (DNNs) are extremely vulnerable to elaborately designed adversarial examples. Adversarial learning on those adversarial examples has proved to be one of the most effective methods for defending against such attacks. At present, most existing adversarial example generation methods are based on first-order gradients, which can hardly further improve models' robustness, especially when facing second-order adversarial attacks. Compared with first-order gradients, second-order gradients provide a more accurate approximation of the loss landscape around natural examples. Inspired by this, our work crafts second-order adversarial examples and uses them to train DNNs. However, second-order optimization involves the time-consuming computation of the Hessian inverse. We propose an approximation method that transforms the problem into an optimization in the Krylov subspace, which remarkably reduces the computational complexity and speeds up the training procedure. Extensive experiments on the MNIST and CIFAR-10 datasets show that our adversarial learning with second-order adversarial examples outperforms other first-order methods, and can improve model robustness against a wide range of attacks.
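Avoiding an explicit Hessian inverse via Krylov-subspace optimization is a standard idea that can be sketched with conjugate gradient: solving Hx = g needs only Hessian-vector products, and the iterates stay in the Krylov subspace spanned by g, Hg, H²g, and so on. This is a generic CG sketch on a toy matrix, not the authors' exact approximation scheme:

```python
import numpy as np

def cg_solve(hvp, g, iters=20, tol=1e-10):
    """Approximate H^{-1} g with conjugate gradient.  Each iteration uses
    only a Hessian-vector product hvp(v) = H @ v, so H is never formed
    or inverted explicitly."""
    x = np.zeros_like(g)
    r = g - hvp(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy SPD "Hessian" exposed only through matvecs.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, 1.0])
x = cg_solve(lambda v: H @ v, g)
print(x)                      # agrees with np.linalg.solve(H, g)
```

In a neural-network setting the `hvp` callback would be a Hessian-vector product computed by automatic differentiation, so the cost per CG step is a few backward passes rather than a d-by-d inversion.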
Bilevel optimization has been applied to a wide variety of machine learning models, and numerous stochastic bilevel optimization algorithms have been developed in recent years. However, most of them restrict their focus to the single-machine setting, so they are unable to handle distributed data. To address this issue, under the setting where all participants compose a network and perform peer-to-peer communication in this network, we develop two novel decentralized stochastic bilevel optimization algorithms based on the gradient tracking communication mechanism and two different gradient estimators. Moreover, we establish their convergence rates of $O(\frac{1}{\epsilon^{2}(1-\lambda)^2})$ and $O(\frac{1}{\epsilon^{3/2}(1-\lambda)^2})$, respectively, for obtaining an $\epsilon$-accurate solution, where $1-\lambda$ denotes the spectral gap of the communication network. To the best of our knowledge, this is the first work to achieve these theoretical results. Finally, we apply our algorithms to practical machine learning models, and the experimental results confirm the efficacy of our algorithms.
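The gradient tracking mechanism the algorithms build on can be sketched on a single-level toy problem: each node keeps a tracker y that, after mixing with its neighbors, converges to the global average gradient. This is the generic (non-bilevel, deterministic) template with invented node objectives, not the paper's algorithms:

```python
import numpy as np

def decentralized_gt(grads, W, x0, step, iters):
    """Decentralized gradient tracking.  grads: per-node gradient
    functions; W: doubly stochastic mixing matrix of the communication
    graph; each node's y variable tracks the global average gradient."""
    n = len(grads)
    x = np.array([x0] * n, dtype=float)
    y = np.array([g(x0) for g in grads], dtype=float)  # tracker init
    g_old = y.copy()
    for _ in range(iters):
        x = W @ x - step * y                 # mix, then step along tracker
        g_new = np.array([grads[i](x[i]) for i in range(n)])
        y = W @ y + g_new - g_old            # tracking update
        g_old = g_new
    return x

# Three nodes with f_i(x) = 0.5*(x - t_i)^2; the global optimum is mean(t).
targets = [1.0, 2.0, 6.0]
grads = [lambda x, t=t: x - t for t in targets]
W = np.full((3, 3), 1 / 3)                   # fully connected averaging
x = decentralized_gt(grads, W, x0=0.0, step=0.1, iters=200)
print(x)                                     # all nodes near 3.0
```

The update preserves the invariant that the average of y equals the average of the local gradients, which is what lets every node descend along an estimate of the global gradient using only neighbor communication.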
In this paper, we propose a novel Hessian-inverse-free Fully Single Loop Algorithm (FSLA) for bilevel optimization problems. Classic algorithms for bilevel optimization admit a computationally expensive double-loop structure. Recently, several single-loop algorithms have been proposed that optimize the inner and outer variables alternately. However, these algorithms have not achieved a fully single loop, as they ignore the loop needed to evaluate the hypergradient for a given inner and outer state. To develop a fully single-loop algorithm, we first study the structure of the hypergradient and identify a general approximation formulation of hypergradient computation that encompasses several previous common approaches, e.g., back-propagation through time, conjugate gradient, \emph{etc.} Based on this formulation, we introduce a new state variable to maintain the historical hypergradient information. Combining our new formulation with the alternating updates of the inner and outer variables, we propose an efficient fully single-loop algorithm. We theoretically show that the error generated by the new state can be bounded and that our algorithm converges at a rate of $O(\epsilon^{-2})$. Finally, we verify the efficacy of our algorithm on multiple bilevel-optimization-based machine learning tasks.
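The single-loop idea can be sketched on a one-dimensional quadratic bilevel problem: a persistent state variable v approximates the Hessian-inverse-vector product across iterations, so each round takes exactly one inner step, one state update, and one outer step. All step sizes and the toy problem are illustrative; this is not the FSLA update rule itself:

```python
def fully_single_loop_bilevel(x0, y0, iters=2000, a=0.5, b=0.5, c=0.1):
    """Single-loop bilevel sketch on a toy problem:
        inner:  g(x, y) = 0.5 * (y - x)^2   ->  y*(x) = x
        outer:  f(x, y) = 0.5 * y^2         ->  F(x) = 0.5 * x^2
    The state v approximates [d^2g/dy^2]^{-1} * df/dy across iterations,
    so no inner loop (and no Hessian inverse) is ever run."""
    x, y, v = x0, y0, 0.0
    for _ in range(iters):
        y = y - a * (y - x)           # one inner gradient step
        v = v - b * (1.0 * v - y)     # state update: v <- v - b*(g_yy*v - f_y)
        hypergrad = 0.0 - (-1.0) * v  # f_x - g_xy * v  (here g_xy = -1, f_x = 0)
        x = x - c * hypergrad         # one outer step
    return x

x = fully_single_loop_bilevel(x0=2.0, y0=0.0)
print(x)   # near 0, the minimizer of F(x) = 0.5 * x^2
```

Because v is warm-started from the previous iteration rather than recomputed, the hypergradient error stays bounded while the three variables converge jointly.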
Recent studies have shown that neural combinatorial optimization (NCO) has advantages over conventional algorithms in many combinatorial optimization problems such as routing, but it is less efficient for more complicated optimization tasks such as packing, which involves mutually conditioned action spaces. In this paper, we propose a Recurrent Conditional Query Learning (RCQL) method to solve both 2D and 3D packing problems. We first embed states with a recurrent encoder, and then adopt attention with conditional queries from previous actions. The conditional query mechanism fills the information gap between learning steps, shaping the problem as a Markov decision process. Benefiting from the recurrence, a single RCQL model is able to handle packing problems of different sizes. Experimental results show that RCQL can effectively learn strong heuristics for offline and online strip packing problems (SPPs), outperforming a wide range of baselines in space utilization. RCQL reduces the average bin gap ratio by 1.83% in offline 2D 40-box cases and by 3.84% in 3D cases compared with state-of-the-art methods. Meanwhile, our method also achieves 5.64% higher space utilization for the SPP with 1,000 items than the state of the art.
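The conditional-query idea, where the attention query is derived from the previous action so each decision is conditioned on what was just placed, can be sketched as a single attention head in NumPy. All shapes, weight matrices, and embeddings here are invented for illustration; RCQL's actual architecture also includes the recurrent encoder and multi-head attention:

```python
import numpy as np

def conditional_query_attention(state_emb, prev_action_emb, Wq, Wk, Wv):
    """Single-head attention whose query comes from the previous action's
    embedding, so the attention over candidate items is conditioned on
    the action just taken (the 'conditional query' idea, sketched)."""
    q = prev_action_emb @ Wq                 # (d,) query from last action
    K = state_emb @ Wk                       # (n, d) keys for candidates
    V = state_emb @ Wv                       # (n, d) values for candidates
    scores = K @ q / np.sqrt(q.shape[0])
    att = np.exp(scores - scores.max())
    att = att / att.sum()                    # softmax over candidate items
    return att @ V, att

rng = np.random.default_rng(0)
n, d = 5, 8                                  # 5 candidate items, dim 8
state_emb = rng.standard_normal((n, d))
prev_action_emb = rng.standard_normal(d)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
ctx, att = conditional_query_attention(state_emb, prev_action_emb, Wq, Wk, Wv)
print(ctx.shape, att.shape)                  # context (8,), weights (5,)
```

Feeding the previous action into the query is what closes the information gap between consecutive decisions and keeps the process Markovian.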
Large training data and expensive model tweaking are standard features of deep learning for images. As a result, data owners often utilize cloud resources to develop large-scale complex models, which raises privacy concerns. Existing solutions are either too expensive to be practical or do not sufficiently protect the confidentiality of data and models. In this paper, we study and compare novel \emph{image disguising} mechanisms, DisguisedNets and InstaHide, aiming to achieve a better trade-off among the level of protection for outsourced DNN model training, the expenses, and the utility of data. DisguisedNets are novel combinations of image blocktization, block-level random permutation, and two block-level secure transformations: random multidimensional projection (RMT) and AES pixel-level encryption (AES). InstaHide is an image mixup and random pixel flipping technique \cite{huang20}. We have analyzed and evaluated them under a multi-level threat model. Under Level-1 adversarial knowledge, RMT provides a stronger security guarantee than InstaHide while preserving model quality well. In contrast, AES provides a security guarantee under Level-2 adversarial knowledge, but it may affect model quality more. The unique features of image disguising also help protect models from model-targeted attacks. We have conducted an extensive experimental evaluation to understand how these methods work in different settings for different datasets.
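The DisguisedNets pipeline of blocktization, block-level permutation, and a per-block secret transformation can be sketched in a few lines of NumPy. This is a simplified RMT-style illustration on a tiny single-channel image; the actual mechanism's key management, noise injection, and AES variant are not shown:

```python
import numpy as np

def disguise_image(img, block, perm, proj):
    """Blocktize an image, permute the blocks, then apply a secret
    random projection to each flattened block (RMT-style, sketched)."""
    h, w = img.shape
    blocks = [img[i:i + block, j:j + block].reshape(-1)
              for i in range(0, h, block) for j in range(0, w, block)]
    blocks = [blocks[p] for p in perm]           # block-level permutation
    return np.stack([proj @ b for b in blocks])  # per-block projection

rng = np.random.default_rng(7)
img = rng.random((4, 4))                         # toy 4x4 "image"
proj = rng.standard_normal((4, 4))               # secret projection matrix
perm = rng.permutation(4)                        # secret block order
out = disguise_image(img, block=2, perm=perm, proj=proj)
print(out.shape)                                 # 4 disguised blocks of length 4
```

The permutation and projection matrix act as the secret key: a party holding both can undo the transformation (the random projection is invertible almost surely), while the model is trained directly on the disguised blocks.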